Parallel Translations As Sense Discriminators
Abstract
This article reports the results of a preliminary analysis of translation equivalents in four languages from different language families, extracted from an on-line parallel corpus of George Orwell's Nineteen Eighty-Four. The goal of the study is to determine the degree to which translation equivalents for different meanings of a polysemous word in English are lexicalized differently across a variety of languages, and to determine whether this information can be used to structure or create a set of sense distinctions useful in natural language processing applications. A coherence index is computed that measures the tendency for different senses of the same English word to be lexicalized differently, and from this data a clustering algorithm is used to create sense hierarchies.
Similar resources
Memory-based Learning of Word Translation
A basic task in machine translation is to choose the right translation for source words with several possible translations in the target language. In this paper we treat word translation as a word sense disambiguation problem and train memory-based classifiers on words with alternative translations. The training data was automatically labeled with the corresponding translations by word-aligning...
Construction of a Benchmark Data Set for Cross-lingual Word Sense Disambiguation
Given the recent trend to evaluate the performance of word sense disambiguation systems in a more application-oriented set-up, we report on the construction of a multilingual benchmark data set for cross-lingual word sense disambiguation. The data set was created for a lexical sample of 25 English nouns, for which translations were retrieved in 5 languages, namely Dutch, German, French, Italian...
Cross-Lingual Word Sense Disambiguation
Word Sense Disambiguation using Cross-Lingual approach has been used successfully for languages like Farsi and Hindi. However, a comparable corpus in the form of Wikipedia articles available in English and Hindi has been used for such a task. This motivated us to further the approach and test the results when a parallel corpus is used. In this project, we specifically wanted to observe if the a...
Using Parallel Corpora for Word Sense Disambiguation
Word Sense Disambiguation (WSD) is the Natural Language Processing (NLP) task that consists in selecting the correct sense of a polysemous word in a given context. Most state-of-the-art WSD systems are supervised classifiers that are trained on manually sense-tagged corpora, which are very time-consuming and expensive to build. In order to overcome this acquisition bottleneck (sense-tagged corp...
Cross-lingual WSD for Translation Extraction from Comparable Corpora
We propose a data-driven approach to enhance translation extraction from comparable corpora. Instead of resorting to an external dictionary, we translate source vector features by using a cross-lingual Word Sense Disambiguation method. The candidate senses for a feature correspond to sense clusters of its translations in a parallel corpus and the context used for disambiguation consists of the ...
Publication date: 1999